Add support for zone-redundant load balancers#5944
Conversation
7572b39 to
67685f0
Compare
67685f0 to
2e5d373
Compare
2e5d373 to
e29bc2e
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5944 +/- ##
==========================================
+ Coverage 43.85% 44.01% +0.16%
==========================================
Files 289 289
Lines 25341 25390 +49
==========================================
+ Hits 11113 11176 +63
+ Misses 13450 13441 -9
+ Partials 778 773 -5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@bryan-cox can you add this new functionality to the existing E2E scenario for a private cluster, which ships with/ an internal LB? E.g.: $ git diff templates/flavors/private/patches/private-lb.yaml
diff --git a/templates/flavors/private/patches/private-lb.yaml b/templates/flavors/private/patches/private-lb.yaml
index 76e1539df..a2933e299 100644
--- a/templates/flavors/private/patches/private-lb.yaml
+++ b/templates/flavors/private/patches/private-lb.yaml
@@ -7,6 +7,10 @@ spec:
apiServerLB:
name: ${CLUSTER_NAME}-internal-lb
type: Internal
+ availabilityZones:
+ - "1"
+ - "2"
+ - "3"
nodeOutboundLB:
frontendIPsCount: 1
controlPlaneOutboundLB:After you apply the above changes to the template partial above, render updated templates w/ |
|
/test pull-cluster-api-provider-azure-e2e-optional |
e29bc2e to
3b77777
Compare
|
/test pull-cluster-api-provider-azure-e2e-optional |
|
/retest |
3b77777 to
6fa7de7
Compare
|
/test pull-cluster-api-provider-azure-e2e-optional |
36f5c8f to
e69a17e
Compare
|
Attempting to get the PR back to its stable state before attempting to address #5944 (comment) again. |
|
/test pull-cluster-api-provider-azure-e2e-optional |
|
/test pull-cluster-api-provider-azure-e2e |
|
/test pull-cluster-api-provider-azure-e2e-optional |
|
Retesting - the previous run failed due to an Azure DNS infrastructure flake. The management cluster API server DNS ( /test pull-cluster-api-provider-azure-e2e-optional |
|
/retest |
|
| PR | Author | Change | apiversion-upgrade results |
|---|---|---|---|
| #6262 | @mboersma | Bump CAPI to v1.13.1 | 5 consecutive failures before passing on 6th run |
| #6287 | @willie-yao | Update test metadata/versions | 1 fail, 2 passes |
| #5944 | this PR | Zone-redundant LB | 2 fails, 2 passes |
PR #6262 is a simple dependency bump with no test-affecting changes — the same commit failed 5 times in a row before passing with zero code changes.
All failures share the same pattern: azureserviceoperator-controller-manager deployment timeout after clusterctl upgrade apply, followed by management cluster API server becoming unreachable (DNS resolution failures, TCP timeouts, HTTP/2 connection drops).
/retest pull-cluster-api-provider-azure-apiversion-upgrade
|
/test pull-cluster-api-provider-azure-apiversion-upgrade |
|
@bryan-cox Is this ready for review? |
|
@willie-yao hey 👋🏻 - yeah it is. I forgot to check it after running the test again. |
willie-yao
left a comment
There was a problem hiding this comment.
Great work on this! Just a few comments from my end, otherwise looks good.
| // For public load balancers, this should be set on the associated public IP addresses instead. | ||
| // +optional | ||
| // +listType=set | ||
| // +kubebuilder:validation:MaxItems=3 |
There was a problem hiding this comment.
I think it could help to add some more validation of availability zones, maybe something like +kubebuilder:validation:Pattern=^[1-3]$.
There was a problem hiding this comment.
Good idea — added +kubebuilder:validation:items:Pattern=^[1-3]$`` so invalid zone values are rejected at admission time.
| } | ||
|
|
||
| if c.Spec.NetworkSpec.NodeOutboundLB != nil && old.Spec.NetworkSpec.NodeOutboundLB != nil { | ||
| if !webhookutils.EnsureStringSlicesAreEquivalent( |
There was a problem hiding this comment.
It looks like the controlPlaneOutboundLB uses ValidateImmutable (which is order-sensitive) instead of EnsureStringSlicesAreEquivalent. I think it's probably better to keep them both consistent but I'm not sure which one is better as the behavior is different.
There was a problem hiding this comment.
Agreed on keeping them consistent. Added an explicit EnsureStringSlicesAreEquivalent check for controlPlaneOutboundLB.AvailabilityZones so it matches the order-insensitive behavior used for apiServerLB and nodeOutboundLB. Since the field has +listType=set, order shouldn't matter. Added a test case for it too.
Add AvailabilityZones to LoadBalancerSpec for zone-redundant load balancer support. For internal LBs, zones are applied to frontend IP configurations. For public LBs, zones are applied to the associated public IP addresses. Includes webhook immutability validation for APIServerLB and NodeOutboundLB zones, and CRD regeneration.
Add documentation page covering zone-redundant load balancer configuration for both internal and public LBs, including examples, migration guidance, and troubleshooting.
Configure availability zones ["1", "2", "3"] on the internal API server load balancer in the private cluster flavor template and CI overlay.
Add unit tests for webhook immutability validation, PublicIPSpecs with LB availability zones, and load balancer frontend IP zone configuration. Add E2E verification of zone-redundant internal LB into the private cluster test.
6a57e82 to
2f7c2a7
Compare
|
/retest |
|
@bryan-cox: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test pull-cluster-api-provider-azure-apiversion-upgrade |
|
/label tide/merge-method-squash |
willie-yao
left a comment
There was a problem hiding this comment.
Great work on this! 🚀
/lgtm
/approve
|
LGTM label has been added. DetailsGit tree hash: 5f134f95e864934eff0d2a3cc6e0a085a9f1339a |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: willie-yao The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR implements support for configuring availability zones on Azure load balancers to enable zone-redundant configurations for high availability.
Azure load balancers can be configured as zone-redundant to ensure high availability across multiple availability zones within a region. This feature allows users to specify availability zones (1, 2, 3) on load balancers, which are then set on the frontend IP configurations.
Key changes:
AvailabilityZonesfield toLoadBalancerSpecAPIWhich issue(s) this PR fixes:
Fixes #5709
Special notes for your reviewer:
Note
This PR was generated with the assistance of AI tooling (Claude Code).
This implementation follows Azure's zone redundancy model:
Zone-redundant LB verification is integrated into the private cluster E2E test per reviewer feedback, rather than as a standalone test.
TODOs:
Release note: